An Effective Approach for Robust Metric Learning in the Presence of Label Noise
Abstract:
Many algorithms in machine learning, pattern recognition, and data mining rely on a similarity or distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function, and Content-Based Information Retrieval (CBIR) systems must rank the retrieved objects by their similarity to the query. Generic measures such as the Euclidean distance and cosine similarity are inappropriate in many applications, so metric learning algorithms have been developed to learn an optimal distance function from data. These methods usually require training data in the form of pairs or triplets, which nowadays is commonly collected from the Internet via crowdsourcing. This side information may therefore be contaminated with label noise, which degrades the learned metric. On some datasets the learned metric can even perform worse than a generic one such as the Euclidean distance. To address this emerging challenge, we present a new robust metric learning algorithm that identifies outliers and label noise simultaneously from the training side information. To this end, we model the probability distribution of label noise based on information in the training data. The proposed distribution function assigns high probability to data points contaminated with label noise, while its value on normal instances is near zero. We then weight the training instances by these probabilities in our metric learning optimization problem, which can be solved efficiently with available SVM libraries such as LibSVM. The proposed approach for identifying data with label noise is general and can easily be applied to any existing metric learning algorithm. After the metric learning phase, we use both the weights and the learned metric to improve the accuracy of metric-based classifiers such as kNN.
Several experiments were conducted on both real and synthetic datasets. The results confirm that the proposed algorithm improves the performance of the learned metric in the presence of label noise and considerably outperforms state-of-the-art peer methods at different noise levels.
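The pipeline the abstract describes has three stages: estimate a per-sample label-noise probability, down-weight suspicious training points when learning the metric, then reuse both the weights and the metric in kNN. A small, self-contained Python sketch of these stages may help. It is not the authors' method. The noise probability is a hypothetical kNN-disagreement heuristic rather than the paper's distribution function. The metric is a diagonal within-class whitening (in the spirit of Relevant Component Analysis) instead of the paper's SVM/LibSVM formulation. All function names and the toy data are illustrative assumptions.

```python
from collections import Counter, defaultdict


def sq_dist(x, y, a=None):
    """Squared diagonal-Mahalanobis distance sum_d a_d * (x_d - y_d)^2.

    With a=None every a_d is 1, i.e. the squared Euclidean distance.
    """
    if a is None:
        a = [1.0] * len(x)
    return sum(ad * (xd - yd) ** 2 for ad, xd, yd in zip(a, x, y))


def noise_probability(X, y, k=3):
    """Hypothetical stand-in for the paper's noise distribution: the fraction
    of a point's k nearest Euclidean neighbours that carry a different label."""
    probs = []
    for i, xi in enumerate(X):
        neighbours = sorted(
            (sq_dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:k]
        disagree = sum(y[j] != y[i] for _, j in neighbours)
        probs.append(disagree / k)
    return probs


def learn_diagonal_metric(X, y, w, eps=1e-6):
    """Illustrative substitute (not the paper's SVM formulation):
    a_d = 1 / (weighted within-class variance of feature d).

    Each point contributes in proportion to its weight w_i, so points judged
    likely to be mislabeled barely affect the class means and variances.
    """
    dim = len(X[0])
    by_class = defaultdict(list)
    for xi, yi, wi in zip(X, y, w):
        by_class[yi].append((xi, wi))
    scatter = [0.0] * dim
    total_w = 0.0
    for members in by_class.values():
        wsum = sum(wi for _, wi in members)
        if wsum == 0:
            continue  # every member of this class was judged noisy
        mean = [sum(wi * xi[d] for xi, wi in members) / wsum for d in range(dim)]
        for xi, wi in members:
            for d in range(dim):
                scatter[d] += wi * (xi[d] - mean[d]) ** 2
        total_w += wsum
    total_w = max(total_w, eps)
    return [1.0 / (scatter[d] / total_w + eps) for d in range(dim)]


def weighted_knn_predict(x, X, y, w, a, k=3):
    """k-NN under metric a; each neighbour's vote is scaled by its weight."""
    nearest = sorted(range(len(X)), key=lambda j: sq_dist(x, X[j], a))[:k]
    votes = Counter()
    for j in nearest:
        votes[y[j]] += w[j]
    return votes.most_common(1)[0][0]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is mostly noise.
X = [[0.0, 0.0], [0.2, 1.0], [-0.2, -1.0], [0.1, 0.5],   # class 0
     [3.0, 0.0], [3.2, 1.0], [2.8, -1.0], [3.1, 0.5],    # class 1
     [0.0, 0.2]]                                          # class-0 point labelled 1
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

p = noise_probability(X, y, k=3)
w = [1.0 - pi for pi in p]  # weight = 1 - estimated noise probability
a = learn_diagonal_metric(X, y, w)
print(p, a)
print(weighted_knn_predict([0.0, 0.25], X, y, w, a, k=3))
```

On this toy data the mislabeled last point receives a noise probability of 1.0 and therefore a weight of 0. The learned metric weights the informative first feature far more heavily than the second. The weighted 3-NN vote then assigns the query to class 0, whereas plain Euclidean 1-NN would copy the noisy label 1.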
Similar resources
Robust Distance Metric Learning in the Presence of Label Noise
Many distance learning algorithms have been developed in recent years. However, few of them consider the problem when the class labels of training data are noisy, and this may lead to serious performance deterioration. In this paper, we present a robust distance learning method in the presence of label noise, by extending a previous non-parametric discriminative distance learning algorithm, i.e...
The use of appropriate MADM model for ranking the vendors of MCI equipments using fuzzy approach
Nowadays, the science of decision making has received more attention due to the complexity of supplier-selection problems. As is known, one of the efficient tools in economic and human-resources development is the extension of communication networks in developing countries, so the proper selection of suppliers of TC equipment is of great concern. In this study, a ...
Testing the Exactitude of Estimation Methods in the Presence of Outliers: An accounting for Robust Kriging
Estimation of gold reserves and resources has been of interest to mining engineers and geologists for ages. The existence of outlier values indicates the economic part of the deposits, provided that they do not stem from human or technical errors. The presence of these high values causes a spurious, dramatic increase in the estimated variance of economic blocks when applying conventional m...
Study of cohesive devices in the textbook of English for the students of psychology by Rastegarpour
This study investigates the cohesive devices used in the textbook of English for the students of psychology. The research questions and hypotheses of the present study concern the frequency and distribution of grammatical and lexical cohesive devices. To answer these questions, all grammatical and lexical cohesive devices in reading comprehension passages from 6 units of 21 units th...
Learning and Evaluation in Presence of Non-i.i.d. Label Noise
In many real-world applications, the simplified assumption of independent and identically distributed noise breaks down, and labels can have structured, systematic noise. For example, in brain-computer interface applications, training data is often the result of lengthy experimental sessions, where the attention levels of participants can change over the course of the experiment. In such applic...
Development and implementation of an optimized control strategy for induction machine in an electric vehicle
In the area of automotive engineering there is a tendency toward greater electrification of the power train. In this work, control of an induction machine for electric-vehicle applications is investigated. Because the operating point of the machine changes, adapting the rotor magnetization current seems useful for increasing the machine's efficiency. In the literature there are many approaches wh...
Journal
Volume 19, Issue 1
Pages 0-0
Publication date: 2022-05
No keywords were provided for this article.